Software Library: MS-DOS Games
software | 2,112 views | 11 favorites | 0 comments

Tony & Friends in Kelloggs Land: a 1990s promotional platformer for MS-DOS (VGA), with good/decent graphics and music. DOSBox setting: cycles=15000 or cycles=max. There are some screen performance issues at times, but the game is playable (not fully tested). The game is in German, which does not really affect gameplay.
Topics: promo, promogame, free, kellogg, kellogg´s, kelloggs, msdos, dos, platformer, jump, scroller, sb,...
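The DOSBox `cycles` value above belongs in the `[cpu]` section of a DOSBox configuration file. A minimal Python sketch that produces just that override, assuming you keep it in its own file (the name `tony.conf` is hypothetical) and load it with DOSBox's `-conf` option:

```python
import configparser
import io


def dosbox_cpu_override(cycles: str = "15000") -> str:
    """Return DOSBox config text that sets only the [cpu] cycles value."""
    config = configparser.ConfigParser()
    # "15000" or "max", per the notes for this game.
    config["cpu"] = {"cycles": cycles}
    buf = io.StringIO()
    # DOSBox config files are written as key=value, so drop the spaces
    # that configparser puts around "=" by default.
    config.write(buf, space_around_delimiters=False)
    return buf.getvalue()
```

Save the returned text as `tony.conf` and start the game with `dosbox -conf tony.conf`; pass `"max"` instead to try the other setting.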

Bulk Bibliographic Metadata
Jan 24, 2018 | data | 585 views | 1 favorite | 0 comments

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 94 million DOIs. Compared to the previous 2017-03 version (see archive.org item "crossref_doi_dump_201703"), this snapshot has a few million more works, but the corpus size is much larger (29 GB compressed vs. 7 GB compressed) as it now contains significantly more citation data, due to the efforts of the Initiative for Open Citations (I4OC) project. This was generated by running the scripts...

Internet Archive Research Publication Crawls
Collection | 19,726 items | 17.6M views | Dec 19, 2017

A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in the Wayback Machine (https://web.archive.org). See also the bibliographic metadata corpora at https://archive.org/details/ia_biblio_metadata
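CDX files index the records stored in WARC files, one capture per line. A minimal Python sketch for reading such an index, assuming the common 11-field layout (`CDX N b a m s k r M S V g`: URL key, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed length, offset, WARC file name). The files in this collection may declare a different layout in their header line, so check it before relying on this field order.

```python
from typing import NamedTuple, Optional


class CdxRecord(NamedTuple):
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    status: str
    digest: str
    redirect: str
    metatags: str
    length: int  # compressed record length in the WARC file
    offset: int  # byte offset of the record in the WARC file
    filename: str


def _to_int(value: str) -> int:
    # "-" marks a missing value in CDX files; map it to -1.
    return int(value) if value.isdigit() else -1


def parse_cdx_line(line: str) -> Optional[CdxRecord]:
    """Parse one line of an assumed 11-field CDX file; None for header/malformed lines."""
    if line.lstrip().startswith("CDX"):
        return None  # header line describing the field layout
    fields = line.split()
    if len(fields) != 11:
        return None
    return CdxRecord(*fields[:8], _to_int(fields[8]), _to_int(fields[9]), fields[10])
```

The offset and length are what make the index useful: a reader can seek straight to one compressed record in a large WARC file instead of scanning it from the start.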

Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Wed Sep 13 11:45:12 PDT 2017 to Fri Sep 15 20:22:07 PDT 2017.
Topic: crawldata

Community Images
Aug 25, 2017 | image | 743 views | 2 favorites | 0 comments

Favicons are the (usually tiny) image files that browsers may use to represent websites in tabs, in the URL bar, or for bookmarks. This dataset contains about 360,000 favicons from popular websites. These favicons were scraped in July 2016. I wrote a crawler that went through Alexa's top 1 million sites, and made a request for 'favicon.ico' at the site root. If I got a 200 response code, I saved the result as ${site_url}.ico. For domains that were identical but for the TLD (e.g. google.com,...
Topics: images, icons, internet
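The crawl described above (request `favicon.ico` at the site root, keep the body only on a 200 response, save it as `${site_url}.ico`) can be sketched in a few lines of Python. This is not the author's crawler; the `http://` scheme and the timeout are assumptions, and `urlopen` follows redirects, so a redirect that ends in a 200 is counted as success here.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def favicon_url(site: str) -> str:
    # Favicon requested at the site root; the http scheme is an assumption.
    return f"http://{site}/favicon.ico"


def output_name(site: str) -> str:
    # Mirrors the ${site_url}.ico naming described above.
    return f"{site}.ico"


def fetch_favicon(site: str, out_dir: Path, timeout: float = 10.0) -> Optional[Path]:
    """Save the site's favicon if the final response is 200; return the path or None."""
    try:
        with urllib.request.urlopen(favicon_url(site), timeout=timeout) as resp:
            if resp.status != 200:
                return None
            data = resp.read()
    except OSError:
        # HTTPError (4xx/5xx), URLError, and timeouts are all OSError subclasses.
        return None
    path = out_dir / output_name(site)
    path.write_bytes(data)
    return path
```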

CD-ROM Software Library
Aug 15, 2017 | software | 14,606 views | 19 favorites | 1 comment

RollerCoaster Tycoon 3 Deluxe Edition (Europe), containing RollerCoaster Tycoon 3, RollerCoaster Tycoon 3: Wild!, and RollerCoaster Tycoon 3: Soaked!
Topic: RollerCoaster Tycoon 3

Community Software
May 3, 2017 | software | 270 views | 2 favorites | 0 comments

Play the part of the Evil Overlord as you make your way through the land, defeating Heroes and bringing Doom with you as you go! Awesome game and fun to play, with hysterical narration; be sure to listen closely :)
Topic: Dungeon Keeper Gold game play fun funny

The Archive Team Just In Time Grabs
Jan 10, 2017 | web | 30 views | 2 favorites | 0 comments

2017 Archive.org Census Identifiers

WARCZone: Outsider WARCs
Nov 20, 2016 | web | 74,063 views | 1 favorite | 0 comments

home.arcor.de is going to be closed on January 31, 2017. This item contains a best-effort grab of the users' sites. Each WARC was seeded with 1,000 users and contains all assets required to display the sites (span-hosts).
Topics: arcor, isp hosting, archiveteam
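Splitting a user list into one seed list per WARC, 1,000 users each, is a simple batching step. A minimal Python sketch; the per-user URL pattern in `seed_url` is hypothetical and not taken from the actual grab, and host spanning for page assets is left to the crawler.

```python
from typing import Iterable, Iterator, List


def seed_url(user: str) -> str:
    # Hypothetical URL pattern for one user's site.
    return f"http://home.arcor.de/{user}/"


def seed_batches(users: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield one list of seed URLs per WARC job, batch_size users each."""
    batch: List[str] = []
    for user in users:
        batch.append(seed_url(user))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, partially filled batch
        yield batch
```

For 2,500 users this yields three jobs of 1,000, 1,000, and 500 seeds.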

CD-ROM Images
Sep 15, 2016 | software | 1,075 views | 1 favorite | 0 comments

French copy of the video game Mob Rule, also known as Street Wars and as Constructor Underworld.
Topics: studio 3, mob rule, constructor, street wars

Community Software
May 14, 2016 | software | 302 views | 2 favorites | 0 comments

http://archiveteam.org/index.php?title=Internet_Archive_Census
An unofficial attempt to count and account for the files available on the Internet Archive, both directly downloadable public files and private files that are available through interfaces like the Wayback Machine or the TV News Archive. The purpose of this project is multi-fold, including collecting the reported hashes of all the files, determining the sizes of various collections, and determining priorities in backing up...
Topic: IA Census
Source: torrent:urn:sha1:d5f9909f56f14867ca2e7a925cb1dadbb2a3da49
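Two of the census goals above, hashing files and totalling collection sizes, reduce to streaming a digest and summing per key. A minimal Python sketch; SHA-1 is chosen only because the item's own torrent URN uses it, and the `(collection, size_in_bytes)` record shape is a hypothetical input format, not the census's actual schema.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def iter_file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    # Read in 1 MiB chunks so large files never load into memory at once.
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")


def sha1_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA-1 digest of a stream of byte chunks."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def collection_sizes(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum file sizes per collection from (collection, size_in_bytes) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    for collection, size in records:
        totals[collection] += size
    return dict(totals)
```

Usage: `sha1_hex(iter_file_chunks("some_file.bin"))` hashes one file; feeding every file's `(collection, size)` pair to `collection_sizes` gives per-collection totals.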

Data Collection
Mar 15, 2016 | data | 146 views | 2 favorites | 0 comments

Software Library: MS-DOS Games
Feb 20, 2016 | software | 336,301 views | 218 favorites | 23 comments

Platforms: DOS, Windows. Published by: Blue Byte Software GmbH. Released: 1997. Genre: Compilation. Description: The Settlers II (Gold Edition) contains The Settlers II: Veni, Vidi, Vici; The Settlers II Mission CD; a full world atlas; and contest entries of 130 fan-made custom maps. From Mobygames.com. Original Entry
Topics: msdos, game

The Dataset Collection
Jul 9, 2015 | data | 51,097 views | 9 favorites | 3 comments

Find the dataset available for instant analysis in BigQuery and queries on this reddit...

(Here is the original Reddit comment announcing this collection of data and what the processes were.) This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues. Q: How are the files structured? Each file is compressed with bzip2 compression....
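Because each file is bzip2-compressed, it can be read as a stream without unpacking it to disk first. A minimal Python sketch, assuming one JSON object per line and a `subreddit` field in each comment; both are assumptions to check against a sample record, and the file name is up to you.

```python
import bz2
import json
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def iter_comments(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Decode one JSON object per non-empty line (assumed layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_subreddit(lines: Iterable[str]) -> Counter:
    # "subreddit" is an assumed field name.
    return Counter(c.get("subreddit") for c in iter_comments(lines))


def count_file(path: str) -> Counter:
    # Text-mode bz2.open decompresses incrementally, so a whole month
    # of comments never has to fit in memory.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return count_by_subreddit(f)
```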

The Dataset Collection
Apr 1, 2014 | software | 1,499 views | 1 favorite | 0 comments

All the "journal article" DOIs from CrossRef's OAI-PMH server; URLs of just under 50 million journal articles.
Topics: doi, dataset
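Harvesting from an OAI-PMH server means paging through `ListRecords` responses, each of which may end with a `resumptionToken` for the next page. A minimal Python sketch of the request URLs; the endpoint and the set name are hypothetical, and parsing the XML responses is left out.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params: Dict[str, str] = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # e.g. a journal-article set, if the server has one
    return f"{base_url}?{urlencode(params)}"
```

The first request carries `metadataPrefix` (and optionally `set`); every later request carries only the token from the previous response, until the server returns an empty token.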

The Archive Team Just In Time Grabs
Feb 27, 2014 | web | 584 views | 1 favorite | 0 comments

This is a panic grab of http://archive.is/alldomains .
Topics: archiveteam, archive.is, panicgrab

CD-ROM Images
Dec 7, 2013 | software | 7,818 views | 5 favorites | 1 comment

Sim City 2000 v1.0 (1994)(Maxis)

The Dataset Collection
Collection | 3,233 items | 419,805 views | Jun 19, 2013

The Dataset Collection consists of large data archives from both sites and individuals.

Archive Team: Preposterous! The Posterous Grab
texts | 8,322 views | 1 favorite | 0 comments

Posterous Hostname List

Internet Census 2012
Collection | 15 items | 2,504 views | May 14, 2013

Abstract: While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address...
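A distributed scan of all IPv4 addresses starts by cutting the 2^32-address space into equal blocks and handing them to workers. A minimal Python sketch of that partitioning step using the standard `ipaddress` module; it is not the census authors' code, and the round-robin assignment and worker names are illustrative assumptions.

```python
import ipaddress
from typing import Dict, List


def ipv4_shards(prefixlen_diff: int) -> List[ipaddress.IPv4Network]:
    """Split the whole IPv4 space into 2**prefixlen_diff equal CIDR blocks."""
    return list(ipaddress.ip_network("0.0.0.0/0").subnets(prefixlen_diff=prefixlen_diff))


def assign_round_robin(shards: List[ipaddress.IPv4Network],
                       workers: List[str]) -> Dict[str, List[ipaddress.IPv4Network]]:
    """Give shard i to worker i mod len(workers)."""
    plan: Dict[str, List[ipaddress.IPv4Network]] = {w: [] for w in workers}
    for i, shard in enumerate(shards):
        plan[workers[i % len(workers)]].append(shard)
    return plan
```

With `prefixlen_diff=2` the space splits into four /2 blocks (0.0.0.0/2, 64.0.0.0/2, 128.0.0.0/2, 192.0.0.0/2); real scans would use far smaller blocks so lost workers cost little.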

Survey Crawls
Collection | 100,903 items | 8.2B views | Nov 17, 2012

Survey crawls are run about twice a year, on average, and attempt to capture the content of the front page of every web host ever seen by the Internet Archive since 1996.
Topic: survey crawls
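Capturing "the front page of every web host" amounts to turning a host list into one root-URL seed per host. A minimal Python sketch; the normalization rules (lowercasing, dropping a trailing dot) and the `http://` scheme are assumptions, not the Internet Archive's actual seed pipeline.

```python
from typing import Iterable, List


def front_page_seeds(hosts: Iterable[str]) -> List[str]:
    """One deduplicated root-URL seed per host name, sorted for stable output."""
    unique = {h.strip().lower().rstrip(".") for h in hosts if h.strip()}
    return [f"http://{h}/" for h in sorted(unique)]
```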

Alexa Crawls DE
Collection | 138 items | 37.8M views | Jul 11, 2012

Crawl data donated by Alexa Internet. This data is currently not publicly accessible.

Community Audio
Jun 11, 2012 | audio | 59 views | 1 favorite | 0 comments

0:00 ZX Spectrum Orchestra - Beepulator
3:41 Haus Arafna - Last Dream Of Jesus
7:31 Fe-Mail featuring Lasse Marhaug - Charmed
13:10 Cloverleaf - Anhedonia
15:20 :zoviet*france: - Angel's Pin Number
20:57 Grails - Black Tar Prophecy
28:10 Sister Iodine - Whitebread Column
30:54 OOIOO - UJA
38:39 Jazkamer - Metal Music Machine
42:49 Byrd E. Bath - Good Old Fashioned Balls

Wayback Indexes
Collection | 554 items | 936M views | Apr 4, 2012

Wayback indexes. This data is currently not publicly accessible.

Common Crawl
Collection | 13,251 items | 127.1M views | Mar 31, 2012

Web crawl data from Common Crawl.

Classic PC Games
Collection | 15,562 items | 12.7M views | Feb 15, 2012

Take a step back in time and revisit your favorite DOS and Windows games. The files available in this collection consist primarily of PC demos, freeware, and shareware. These files are the original releases, which will require intermediate to advanced knowledge to install and run on modern operating systems. Where possible, online play is enabled so you can enjoy the game directly in your browser. New files are added to this collection on a regular basis. Specific news regarding major updates...
Topics: PC Games, Vintage computer games, Windows games, DOS games

Top Domains
Collection | 172,383 items | 2.1B views | Nov 29, 2011

A daily collection of thousands of the most popular web sites according to Alexa.com's top sites rankings.
Topics: daily, popular sites, Alexa

Focused Crawls
Collection | 339,422 items | 3.2B views | Nov 4, 2011

Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Topic: webcrawl
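A crawl "focused on a single domain or subdomain" needs a scope rule that decides which discovered links to follow. A minimal Python sketch of one such rule (host equals the domain or is a subdomain of it); real crawlers such as Heritrix use richer, configurable scope rules, so treat this as an illustration only.

```python
from urllib.parse import urlsplit


def in_scope(url: str, domain: str) -> bool:
    """True if url's host is domain itself or any subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()  # hostname is already lowercased
    domain = domain.lower().rstrip(".")
    # The leading "." stops look-alikes such as badexample.com from matching example.com.
    return host == domain or host.endswith("." + domain)
```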

WikiTeam
Collection | 262,251 items | 1.2M views | Aug 2, 2011

WikiTeam software is a set of tools for archiving wikis. They work on MediaWiki wikis, but we want to expand to other wiki engines. As of January 2020, WikiTeam has preserved more than 250,000 wikis, several wikifarms, regular Wikipedia dumps and 34 TB of Wikimedia Commons images.
About WikiTeam: There are thousands of wikis on the Internet. Every day some of them are no longer publicly available and, due to lack of backups, lost forever. Millions of people download tons of media files...
Topic: wikis
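Archiving a MediaWiki wiki typically begins by listing every page title through the wiki's `api.php` endpoint (`action=query&list=allpages`), following the `continue` object each response returns. A minimal Python sketch of the request URLs; this is not WikiTeam's own code, and the wiki URL is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def allpages_url(api_url: str, cont: Optional[Dict[str, str]] = None) -> str:
    """Build a MediaWiki API request that lists page titles, one batch at a time."""
    params: Dict[str, str] = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",  # largest batch the wiki allows
        "format": "json",
    }
    if cont:
        # Merge the "continue" object from the previous response to get the next batch.
        params.update(cont)
    return f"{api_url}?{urlencode(params)}"
```

Usage: request `allpages_url("https://wiki.example.org/w/api.php")`, then keep passing the response's `continue` object back in until a response no longer contains one.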

Wikipedia Outlinks
Collection | 73,961 items | 1.4B views | May 13, 2011

Crawl of outlinks from wikipedia.org. These files are currently not publicly accessible. From Wikipedia: Wikipedia is a multilingual, web-based, free-content encyclopedia project operated by the Wikimedia Foundation and based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the...

Live Web Proxy Crawls
Collection | 108,892 items | 7.8B views | Apr 26, 2011

Content crawled via the Wayback Machine Live Proxy, mostly by the Save Page Now feature on web.archive.org. The liveweb proxy is a component of the Internet Archive's Wayback Machine project. It captures the content of a web page in real time, archives it into an ARC or WARC file, and returns the ARC/WARC record to the Wayback Machine for processing. The recorded ARC/WARC file becomes part of the Wayback Machine in due course.
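To make "archives it into a WARC file" concrete, here is a minimal Python sketch of one WARC/1.0 record built by hand from the standard library. It writes a simpler `resource` record holding only the page body; the live proxy's real output would normally be `response` records that also carry the HTTP headers, and production code would use a WARC library rather than this sketch.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record: header block, blank line, payload."""
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",  # length of the payload block only
    ]
    header_block = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # Every WARC record ends with two CRLFs after its content block.
    return header_block + payload + b"\r\n\r\n"
```

Appending several such records to one file, optionally gzip-compressing each record separately, yields a WARC file that CDX indexes can point into.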

Custom Crawl Services
Collection | 126,691 items | 1.4B views | Apr 8, 2011

Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.
Topic: ccs

urlteam (301works.org Archive)
Collection | 129 items | 1,629 views | Mar 15, 2011

Shortened URL archive

Alexa Crawls
Collection | 226,901 items | 12.4B views | Nov 16, 2010

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.
Topics: web crawl, Alexa
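"Added to the Wayback Machine after an embargo period" is a date comparison. A minimal Python sketch; the embargo length is not stated here, so `embargo_days` is a hypothetical parameter, not Alexa's or the Internet Archive's actual policy.

```python
from datetime import date, timedelta


def is_released(capture_day: date, today: date, embargo_days: int) -> bool:
    """True once embargo_days have passed since the capture date."""
    return today >= capture_day + timedelta(days=embargo_days)
```

With a 10-day embargo, a capture from 2020-01-01 becomes available on 2020-01-11 but not on 2020-01-10.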

Internet Archive Photo and Video Collection
Collection | 37 items | 37,254 views | Oct 18, 2010

Photos and video from Internet Archive events.
Topic: internetarchivedump

Internet Archive Presents
Collection | 879 items | 10.6M views | Oct 11, 2010

Presentations and events at the Internet Archive.
Topic: collection

Web Crawls
Collection | 16,499,631 items | 63B views | Oct 8, 2010

The Web Archive of the Internet Archive started in late 1996, is made available through the Wayback Machine, and some collections are available in bulk to researchers. Many pages are archived by the Internet Archive for other contributors, including partners of Archive-It and Save Page Now users. Other captures are donated to the Internet Archive by partners such as Alexa Internet.
Topic: Web Archive

Worldwide Web Crawls
Collection | 627,440 items | 13.4B views | Oct 5, 2010

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites. Since September 10th, 2010, the Internet Archive has been running Worldwide Web Crawls of the global web, capturing web elements, pages, sites and parts of sites. Each Worldwide Web Crawl was initiated from one or more lists of URLs that are known as "Seed Lists". Descriptions of the Seed Lists associated with each crawl may be provided as part of the metadata for...

Web Server Logs
Collection | 2,487 items | 14,671 views | Jun 16, 2010

Server logs from archive.org: usage logs from the web servers of the Internet Archive and the Wayback Machine.
Topic: webserverlogs
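The format of these logs is not stated here. If they follow the widely used Apache Common/Combined Log Format (an assumption to verify against a sample file), each line can be split into fields with one regular expression. A minimal Python sketch:

```python
import re
from typing import Dict, Optional

# Common Log Format, optionally followed by the Combined format's referer and user agent.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Lines in the shorter Common format still parse; their `referer` and `agent` fields come back as `None`.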

Internet Archive Web Crawls
Collection | 1,521,602 items | 32.4B views | Jun 11, 2010

The Internet Archive discovers and captures web pages through many different web crawls. At any given time several distinct crawls are running, some for months, and some on a daily or longer cycle. View the web archive through the Wayback Machine.
Topic: webwidecrawl
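Captures are replayed in the Wayback Machine through URLs of the form `https://web.archive.org/web/<timestamp>/<url>`, where the timestamp is 14 digits (`YYYYMMDDhhmmss`). A minimal Python sketch that builds one; when no capture exists at exactly that moment, the Wayback Machine typically redirects to a nearby one.

```python
from datetime import datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine replay URL for url at the given capture time."""
    return f"https://web.archive.org/web/{when.strftime('%Y%m%d%H%M%S')}/{url}"
```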

301Works.org
Collection | 8,358 items | 509,256 views | Oct 8, 2009

301Works.org is an independent service for archiving URL mappings. The goal of the service is to protect everyday users of short URL services by making their mappings transparent and permanent. Shortened URL archives are held in accordance with 301Works.org membership terms. Items contained in the archives are not publicly accessible at this time. 301Works Frequently Asked Questions
Topic: 301works
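The point of archiving URL mappings is that a short code can still be resolved to its long URL with an HTTP 301 redirect after the original shortener disappears. A minimal Python sketch; the tab-separated `short_code<TAB>long_url` format is a hypothetical export format, not the layout of the 301Works archives, which are not public.

```python
from typing import Dict, Optional, Tuple


def parse_mappings(text: str) -> Dict[str, str]:
    """Read 'short_code<TAB>long_url' lines into a dict (hypothetical format)."""
    mappings: Dict[str, str] = {}
    for line in text.splitlines():
        code, _, target = line.partition("\t")
        if code.strip() and target:  # skip blank or malformed lines
            mappings[code] = target
    return mappings


def resolve(code: str, mappings: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (301, long_url) for a known code, else (404, None)."""
    target = mappings.get(code)
    return (301, target) if target else (404, None)
```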